Exploring Weather Trends: Columbus, OH
Photo by Oz Seyrek on Unsplash
In this project, I analyze local and global temperature data and compare the temperature trends in my local city (Columbus, OH) to the overall global temperature trends. I will seek to answer the following questions:
The following section describes the steps taken to extract and wrangle the local and global temperature data.
The following SQL queries were used to extract the data from the database provided:
City-level Data:
SELECT *
FROM city_data
WHERE country = 'United States'
AND city = 'Columbus';
Global Data:
SELECT *
FROM global_data;
The results of the above queries were exported to CSVs.
First, the exported CSVs were imported into RStudio. The year columns were then typecast into the Date type, and the annual average temperature columns were converted to a numeric type. Because a simple way to prepare data for plotting comparisons is to stack the two datasets into one, the city-level data was assigned a group variable where each value is ‘Columbus’, and the global data was assigned a group variable with value ‘Global’. The moving averages for each set were calculated as follows using the zoo library:
library(dplyr)  # provides %>% and mutate(); zoo is called via zoo::

# Right-aligned windows: each average covers the current year and the k - 1 years before it
cbus_data_full <- cbus_data %>%
  mutate(`5-Year MA` = zoo::rollmean(avg_temp, k = 5, fill = NA, align = 'right'),
         `10-Year MA` = zoo::rollmean(avg_temp, k = 10, fill = NA, align = 'right'),
         `50-Year MA` = zoo::rollmean(avg_temp, k = 50, fill = NA, align = 'right')
  )

# Same moving averages for the global series
global_data_full <- global_temps %>%
  mutate(`5-Year MA` = zoo::rollmean(avg_temp, k = 5, fill = NA, align = 'right'),
         `10-Year MA` = zoo::rollmean(avg_temp, k = 10, fill = NA, align = 'right'),
         `50-Year MA` = zoo::rollmean(avg_temp, k = 50, fill = NA, align = 'right')
  )
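To make the effect of align = 'right' and fill = NA concrete, here is a minimal sketch on a toy vector. The values and the names temps, ma3, and manual_last are made up for illustration and are not taken from the project data.

```r
library(zoo)

# Hypothetical toy series (not project data)
temps <- c(10, 12, 11, 13, 15, 14)

# Right-aligned 3-value moving average: each result averages the current
# value and the two before it; the first k - 1 positions are filled with NA
ma3 <- zoo::rollmean(temps, k = 3, fill = NA, align = "right")
print(ma3)
# NA NA 11 12 13 14

# Manual check of the last window (positions 4 to 6)
manual_last <- mean(temps[4:6])
print(manual_last)
# 14
```

With a right-aligned window, the value plotted for a given year only uses that year and earlier years, which is why the first k - 1 years of each series have no moving average.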
For exploratory purposes, 5-, 10-, and 50-year moving averages were calculated. This method was verified against the example dataset provided in the course. Finally, the two datasets were combined into one using dplyr::bind_rows. Ultimately, the 10-year moving average was selected for visualization; the 5-year moving average provided such a detailed view that the overall trends were difficult to assess, and the 50-year moving average cast such a wide aggregation that the subtle changes between decades were lost.
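The preparation and combination steps described above (typecasting, adding a group variable, and stacking with dplyr::bind_rows) could look roughly like the following sketch. The inline CSV text stands in for the exported files, the temperature values are made up, and the helper prep() and the column names year and avg_temp are assumptions about the query output rather than the exact code used in the project.

```r
library(dplyr)

# Hypothetical stand-ins for the exported CSVs (values are made up)
cbus_csv <- "year,city,country,avg_temp
1900,Columbus,United States,10.0
1901,Columbus,United States,11.0"
global_csv <- "year,avg_temp
1900,8.0
1901,9.0"

# Hypothetical helper: typecast the columns and tag each row with its group
prep <- function(csv_text, group_label) {
  read.csv(text = csv_text, stringsAsFactors = FALSE) %>%
    transmute(
      # Treat each year as January 1 of that year so it becomes a Date
      year = as.Date(paste0(year, "-01-01")),
      avg_temp = as.numeric(avg_temp),
      group = group_label
    )
}

# Stack the two prepared sets into one long data frame for plotting
combined <- bind_rows(prep(cbus_csv, "Columbus"), prep(global_csv, "Global"))
print(combined)
```

Keeping both series in one long data frame with a group column means a plotting library can map group to color and draw the two lines from a single dataset.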